Exploratory Data Analysis of Meteorological Parameters¶


This notebook presents an exploratory data analysis (EDA) of various meteorological parameters, viz. temperature, relative humidity, wind speed, sunshine hours, and solar radiation, along with the estimated reference evapotranspiration.

Table of Contents¶

  • 1. Packages
  • 2. Meteorological Data
  • 3. Data Exploration
  • 4. Important Features
  • 5. Trend Analysis
    • 5.1 Trend analysis (monthly average)
    • 5.2 Trend analysis (annual average)

1. Packages¶

In [1]:
import numpy as np # To create and deal with arrays
import pandas as pd # To manage and create various datasets
import matplotlib.pyplot as plt # To plot graphs
import seaborn as sns # To plot scientific graphs
import pymannkendall as mk # To detect trends in time-series parameters, e.g. temperature, evapotranspiration

2. Meteorological Data¶

In [2]:
df = pd.read_excel("climate.xlsx")
In [3]:
df
Out[3]:
J Tmax Tmin RHmax RHmin Wind_speed n n/N Rns VPD ETo
0 1 15.8 1.0 96 57.0 0.256 6.5 0.646 8.647 0.399 1.026
1 2 20.4 0.3 91 31.0 0.256 8.0 0.795 9.794 0.855 1.166
2 3 15.0 5.2 91 64.0 0.614 0.5 0.050 4.169 0.347 0.953
3 4 20.2 5.4 97 46.0 0.460 8.8 0.873 10.445 0.653 1.419
4 5 21.4 4.8 94 48.0 0.486 8.5 0.842 10.246 0.688 1.464
... ... ... ... ... ... ... ... ... ... ... ...
9850 361 19.6 4.8 97 50.0 0.384 7.3 0.728 9.171 0.583 1.248
9851 362 20.2 5.0 97 51.0 0.614 8.3 0.827 9.929 0.593 1.432
9852 363 18.8 4.6 97 45.0 0.563 8.6 0.857 10.168 0.610 1.381
9853 364 19.2 4.0 97 45.0 0.230 5.0 0.498 7.494 0.624 1.067
9854 365 19.2 4.8 97 44.0 0.563 7.4 0.736 9.306 0.636 1.369

9855 rows × 11 columns

The dataset contains 11 columns in total, representing 10 features and 1 label (ETo). The column indices corresponding to the different features are shown by df.info() and are used later in this notebook. There are 9855 rows in total, representing 27 years of data from 1995 to 2021.

In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9855 entries, 0 to 9854
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   J           9855 non-null   int64  
 1   Tmax        9855 non-null   float64
 2   Tmin        9855 non-null   float64
 3   RHmax       9855 non-null   int64  
 4   RHmin       9855 non-null   float64
 5   Wind_speed  9855 non-null   float64
 6   n           9855 non-null   float64
 7   n/N         9855 non-null   float64
 8   Rns         9855 non-null   float64
 9   VPD         9855 non-null   float64
 10  ETo         9855 non-null   float64
dtypes: float64(9), int64(2)
memory usage: 847.0 KB

J = Julian day
Tmax = Maximum Temperature in Celsius
Tmin = Minimum Temperature in Celsius
RHmax = Maximum Relative Humidity in percentage
RHmin = Minimum Relative Humidity in percentage
Wind_speed = Wind Speed in metres per second
n = Actual sunshine hours
n/N = Ratio of actual to maximum possible sunshine hours
Rns = Net Solar Radiation in MJ/$m^2$/day
VPD = Vapour Pressure Deficit in kPa
ETo = Reference Evapotranspiration in mm
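These units can be kept alongside the dataframe for labelling plots later on. A minimal sketch (the UNITS dict and axis_label helper are illustrative additions, not part of the notebook):

```python
# Hypothetical helper: map each column name to its unit for plot labels.
UNITS = {
    "Tmax": "°C", "Tmin": "°C",
    "RHmax": "%", "RHmin": "%",
    "Wind_speed": "m/s", "n": "h", "n/N": "-",
    "Rns": "MJ/m2/day", "VPD": "kPa", "ETo": "mm",
}

def axis_label(col):
    """Return a plot-ready label such as 'VPD (kPa)'."""
    return f"{col} ({UNITS[col]})" if col in UNITS else col
```

This keeps axis labels consistent across all the figures below, e.g. `plt.ylabel(axis_label('ETo'))`.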

3. Data Exploration¶

In this section, initial data analysis is carried out to analyse how the features and the label vary with time, together with their minimum, maximum, and mean values over the period.

In [5]:
df.describe()
Out[5]:
J Tmax Tmin RHmax RHmin Wind_speed n n/N Rns VPD ETo
count 9855.000000 9855.000000 9855.000000 9855.000000 9855.000000 9855.000000 9855.000000 9855.000000 9855.000000 9855.000000 9855.000000
mean 183.000000 29.813807 17.490939 83.573617 48.862253 1.059630 7.438709 0.619921 13.509078 1.449226 3.767591
std 105.371375 7.419522 8.128256 15.713658 19.592271 0.659309 3.658271 0.298696 4.931936 1.022751 1.991188
min 1.000000 5.200000 -1.600000 22.000000 3.000000 0.000000 0.000000 0.000000 3.713000 0.000000 0.580000
25% 92.000000 24.000000 10.200000 78.000000 34.000000 0.614000 5.100000 0.423000 9.871000 0.728000 2.027000
50% 183.000000 31.400000 18.000000 89.000000 47.000000 0.895000 8.400000 0.720000 13.432000 1.187000 3.556000
75% 274.000000 35.000000 25.000000 95.000000 62.000000 1.356000 10.200000 0.861000 17.484500 1.814000 5.221000
max 365.000000 46.600000 44.400000 100.000000 100.000000 7.419000 20.000000 1.566000 29.889000 5.692000 10.447000
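The summary above also flags a few physically questionable values, e.g. an n/N maximum of 1.566 (the ratio cannot exceed 1) and a Tmin maximum of 44.4 °C. A hedged sketch of a simple consistency check, using a small synthetic frame in place of df (the values here are invented):

```python
import pandas as pd

# Synthetic stand-in for a few rows of the climate dataframe.
sample = pd.DataFrame({
    "Tmax": [15.8, 20.4, 10.0],
    "Tmin": [1.0, 0.3, 12.0],    # last row: Tmin > Tmax, physically suspect
    "n/N":  [0.646, 1.2, 0.05],  # 1.2 exceeds the physical upper limit of 1
})

# Rows violating simple physical-consistency constraints
suspect = sample[(sample["Tmin"] > sample["Tmax"]) | (sample["n/N"] > 1)]
```

Running the same filter on the real df would isolate the suspect days for closer inspection.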
In [6]:
fig, ax = plt.subplots(5,1, figsize=(14,16), dpi=250)

x = 365 # number of days in a year
# Draw a vertical line after every 365 points, i.e. at each year boundary,
# for easier reading of the parameters
for year in range(27): # number of years in the dataset
    for i in range(5): # for all five subplots
        ax[i].axvline(x=x, color='k')
    x = x + 365
    
ax[0].set_title('Daily Max and Min temperature from 1995 to 2021')
ax[0].plot(df.index,df['Tmax'])
ax[0].plot(df.index,df['Tmin'])
ax[0].axhline(df['Tmax'].mean(), color = 'navy') # A horizontal line for mean maximum temperature
ax[0].axhline(df['Tmin'].mean(), color = 'orange') # A horizontal line for mean minimum temperature
ax[0].axhline(df['Tmax'].max(), ls = '--') # A horizontal line for uppermost value maximum temperature
ax[0].axhline(df['Tmin'].min(), ls = '--') # A horizontal line for lowermost value minimum temperature

ax[1].set_title('Daily Avg temperature from 1995 to 2021')
ax[1].plot(df.index,df[['Tmax','Tmin']].mean(axis=1), 'r')
#ax[1].plot(df[['Tmax','Tmin']].mean(axis=1).rolling(7).mean(),'pink')

ax[2].set_title('Daily Max and Min RH from 1995 to 2021')
ax[2].plot(df.index,df['RHmax'],'g')
ax[2].plot(df.index,df['RHmin'],'y')
ax[2].axhline(df['RHmax'].mean(), color = 'blue')
ax[2].axhline(df['RHmin'].mean(), color = 'green')

ax[3].set_title('Daily Avg RH from 1995 to 2021')
ax[3].plot(df.index,df[['RHmax','RHmin']].mean(axis=1), 'grey')
#ax[3].plot(df[['RHmax','RHmin']].mean(axis=1).rolling(7).mean(), 'pink')

ax[4].set_title('Daily ET from 1995 to 2021')
ax[4].plot(df.index,df['ETo'])
ax[4].axhline(df['ETo'].mean())
ax[4].axhline(df['ETo'].max(), ls = '--')
ax[4].axhline(df['ETo'].min(), ls = '--')

plt.tight_layout()
In [7]:
plt.figure(figsize=(12,8), dpi=200)
sns.heatmap(df.drop('J',axis =1).corr(), linewidth=0.5, annot=True, cmap='viridis');
In [8]:
sns.pairplot(df.drop('J',axis =1),
             kind ="reg", plot_kws={'line_kws':{'color':'tomato'}, 'color':'darkcyan'}, 
             diag_kws={'color':'tomato'}, corner=True)
Out[8]:
<seaborn.axisgrid.PairGrid at 0x1dcd6df21f0>

4. Important Features¶

Adaptive boosting (AdaBoost), a machine learning technique, is employed to extract the most important features for estimating the label, which in our case is ETo, with reasonable accuracy.

In [9]:
X = df.drop(['J', 'ETo'], axis=1) # all features used to estimate ETo are separated out and referred to as X
In [10]:
y = df['ETo'] # the label is referred to as y
In [11]:
from sklearn.model_selection import train_test_split # the dataset is split into train (85%) and test (15%) sets
In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=101)
In [13]:
from sklearn.ensemble import AdaBoostRegressor

An instance of AdaBoost with one estimator and a learning rate of 0.8 is created. Here, the base estimator is a decision tree with one level (a decision stump).

In [14]:
feature1 = AdaBoostRegressor(n_estimators=1, learning_rate=0.8, random_state=101) 
In [15]:
feature1.fit(X_train,y_train)
Out[15]:
AdaBoostRegressor(learning_rate=0.8, n_estimators=1, random_state=101)
In [16]:
from sklearn.metrics import r2_score # R-square is used to analyze the performance of model
In [17]:
preds = feature1.predict(X_test)
In [18]:
print('R2 Value:',r2_score(y_test, preds))
R2 Value: 0.8594273982697587
In [19]:
feature1.feature_importances_
Out[19]:
array([0.8443255 , 0.        , 0.        , 0.        , 0.02269728,
       0.        , 0.        , 0.13297722, 0.        ])
In [20]:
feats = pd.DataFrame(index=X.columns,data=feature1.feature_importances_,columns=['Importance'])
In [21]:
imp_feats = feats[feats['Importance']>0].sort_values("Importance")
In [22]:
plt.figure(figsize=(4,3),dpi=100)
sns.barplot(data=imp_feats,x=imp_feats.index,y='Importance')
Out[22]:
<AxesSubplot:ylabel='Importance'>

With one estimator, maximum temperature turns out to be the most important feature for estimating ETo, followed by Rns and wind speed. This shows that with these three features alone, ETo can be estimated fairly well, with an R-square value of 0.86. Now, let's find the optimum number of estimators and the corresponding features to estimate ETo with even higher accuracy.

In [23]:
error_rates = []
r2 = []

for n in range(1,20):
    
    model = AdaBoostRegressor(n_estimators=n, random_state=33)
    model.fit(X_train,y_train)
    preds = model.predict(X_test)
    err = (np.mean(np.abs((y_test - preds) / y_test)))
    r2sc = r2_score(y_test, preds)
    
    error_rates.append(err)
    r2.append(r2sc)
fig,ax =plt.subplots(1,2, figsize=(10,6), dpi=100)
ax[0].plot(range(1,20),error_rates)
ax[1].plot(range(1,20),r2)
ax[0].axvline(x=11, color='r', linestyle='--')
ax[1].axvline(x=11, color='r', linestyle='--')
ax[0].set_title('Variation in error with number of estimators')
ax[0].set_xlabel('no. of estimators')
ax[0].set_ylabel('error')
ax[1].set_title('Variation in R-square with number of estimators')
ax[1].set_xlabel('no. of estimators')
ax[1].set_ylabel('R-square')
plt.tight_layout();

From the plots above, it can be seen that the minimum error and a comparatively high R-square value are achieved with 11 estimators.

In [24]:
feature2 = AdaBoostRegressor(n_estimators=11, random_state=33)
In [25]:
feature2.fit(X_train,y_train)
Out[25]:
AdaBoostRegressor(n_estimators=11, random_state=33)
In [26]:
pred = feature2.predict(X_test)
In [27]:
print('R2 Value:',r2_score(y_test, pred))
R2 Value: 0.9421388992848636
In [28]:
feature2.feature_importances_
Out[28]:
array([0.35246599, 0.01852268, 0.07022322, 0.0026595 , 0.08157826,
       0.        , 0.        , 0.28279395, 0.19175641])
In [29]:
feats2 = pd.DataFrame(index=X.columns,data=feature2.feature_importances_,columns=['Importance'])
In [30]:
imp_feats2 = feats2[feats2['Importance']>0].sort_values("Importance")
In [31]:
plt.figure(figsize=(4,3),dpi=100)
sns.barplot(data=imp_feats2,x=imp_feats2.index,y='Importance')
plt.xticks(rotation=90);

With 11 estimators, an R-square above 0.94 is achieved using these 7 features, with Tmax contributing the most and RHmin the least. Next, we will see how the top three features relate to ETo.

In [32]:
df.columns
Out[32]:
Index(['J', 'Tmax', 'Tmin', 'RHmax', 'RHmin', 'Wind_speed', 'n', 'n/N', 'Rns',
       'VPD', 'ETo'],
      dtype='object')
In [33]:
for i in ['Tmax','Rns','VPD']:
    # sns.jointplot creates its own figure, so a separate plt.figure call
    # would only produce empty extra figures
    sns.jointplot(x=i, y='ETo', data=df);

5. Trend Analysis¶

To analyse the trend of the features and the label over time, the Mann-Kendall method is adopted to detect an increasing, decreasing, or no trend in the dataset.
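At its core, the Mann-Kendall test is built on the statistic $S=\sum_{i<j}\operatorname{sign}(x_j-x_i)$; pymannkendall's original_test then derives the z-value, p-value, and Sen's slope from it. A minimal sketch of the S statistic itself (this helper is illustrative, not part of the notebook):

```python
import numpy as np

def mk_s(x):
    """Mann-Kendall S statistic: sum of sign(x_j - x_i) over all pairs i < j."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return int(sum(np.sign(x[j] - x[i]) for i in range(n) for j in range(i + 1, n)))
```

A strictly increasing series of length n gives the maximum S = n(n-1)/2, a strictly decreasing one the minimum, and S near 0 suggests no trend.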

For this analysis we need a dataframe that represents each year separately for each month and feature. For ease, I created two functions to manage the data. These functions convert the dataframe into a three-dimensional array, from which the needed dataframes can be created using NumPy slicing and the pandas DataFrame constructor.

annual_data(): In this function, the required arguments are shape and df. shape is the shape of the 3-D array, which in this notebook is [365, 11, 27]: the 365 Julian days of a year on axis 0, the 11 features and label on axis 1, and the 27 years on axis 2. A 3-D array is thus created with days row-wise, features and label column-wise, and years along the depth axis (you can imagine the years stacked one behind another).

month_data_avg(): In this function, the monthly average is calculated for each column (features and label). It takes sh_mon and m as arguments, where sh_mon gives years, columns, and months on axes 0, 1, and 2 respectively, and m holds the cumulative Julian-day cutoffs of the months.
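As a sanity check on the stacking logic, the same 3-D layout can be produced with a plain reshape and transpose; a sketch with synthetic values standing in for the real dataframe:

```python
import numpy as np

# Synthetic stand-in for the 9855 x 11 dataframe values (27 years x 365 days).
values = np.arange(27 * 365 * 11, dtype=float).reshape(27 * 365, 11)

# Equivalent to annual_data(): split the rows into years, then move the
# year axis to the back -> shape (365 days, 11 columns, 27 years).
annual = values.reshape(27, 365, 11).transpose(1, 2, 0)
```

Year i then occupies rows i*365 to (i+1)*365 of the original array, exactly as in the loop below.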

In [34]:
def annual_data(shape=None, df = df):
    temp = np.array(df)
    annual = np.zeros((shape))
    for i in range(shape[2]):
        annual[:,:,i] = temp[i*shape[0]:((i+1)*shape[0])]
    return annual
In [35]:
shape_annual = [365, 11, (2021-1995+1)] # 365 Julian days on axis 0, 11 columns on axis 1, 27 years on axis 2
In [36]:
annual = annual_data(shape = shape_annual, df = df)
In [37]:
m = [0,31,59,90,120,151,181,212,243,273,304,334,365]
shape_monthly = [27,11,12] # axis 0: years, axis 1: columns, axis 2: months
In [38]:
def month_data_avg(sh_mon =None, m = None):
    temp = annual_data(shape = shape_annual, df = df)
    monthly = np.zeros((sh_mon))
    for i in range(sh_mon[0]):
        k, l = 0, 1
        for j in range(sh_mon[2]):
            monthly[i,:,j] = np.average(temp[m[k]:m[l],:,i], axis = 0)
            k +=1
            l+=1
    return monthly
In [39]:
monthly = month_data_avg(shape_monthly,m)
In [40]:
f = monthly[:,10,:] # slice across all 27 years of monthly-average ETo (column index 10) for all 12 months
In [41]:
months = ["Jan", "Feb", "March", "April", "May", "June", "July", "Aug", "Sept", "Oct", "Nov", "Dec"]
In [42]:
df_f = pd.DataFrame(f, index = list(range(1995,2022)), columns = months)
In [43]:
df_f
Out[43]:
Jan Feb March April May June July Aug Sept Oct Nov Dec
1995 1.497742 2.143464 3.396290 5.337300 6.980581 7.414033 5.099871 3.517323 4.299800 3.401258 2.324367 1.617935
1996 1.563968 2.141536 3.512129 5.549233 6.801742 6.889333 5.295129 3.611484 4.291467 3.304258 2.353533 1.702581
1997 1.497194 2.537179 3.565742 5.150133 7.168677 5.891700 4.741710 4.110839 4.520833 2.723710 1.872300 0.983645
1998 1.546097 2.322321 3.332742 5.549800 7.017452 6.311700 4.819516 4.608226 3.957567 3.050290 2.150533 1.227355
1999 1.217806 2.310321 3.825226 5.927833 6.870968 6.357833 5.045548 4.616387 4.358367 3.247903 2.276100 1.374645
2000 1.475387 2.097036 3.790258 5.481900 6.665968 5.941967 4.373387 5.007710 4.386567 3.175226 1.855967 1.452677
2001 1.403065 2.478000 3.734387 5.234733 7.030161 5.463800 4.321839 4.794935 4.475767 2.887613 2.139167 1.391968
2002 1.499355 2.273179 3.749226 5.515867 6.910710 6.602633 5.585710 4.757097 4.060200 3.147774 1.971067 1.415677
2003 1.078419 2.234750 3.416839 5.455533 6.490097 6.140200 4.678548 4.467258 4.211300 3.185581 1.976533 1.377806
2004 1.251839 2.538464 4.011258 5.785400 6.607871 6.079933 5.732806 4.507161 4.467367 2.928548 1.998400 1.437161
2005 1.450903 1.924321 3.311677 5.550100 6.610290 6.845200 4.494806 4.971290 4.129367 3.175774 2.152400 1.380968
2006 1.739677 2.488321 3.597323 5.799500 7.027839 6.029133 4.676774 4.581097 4.003967 3.186839 1.953533 1.430806
2007 1.617290 1.955571 3.680903 5.786300 6.937806 6.233467 4.931548 4.406065 3.692767 3.232968 1.855333 1.363419
2008 1.514806 2.219750 3.838548 5.440600 5.894710 4.842933 4.754581 4.353548 4.166733 3.104355 2.019967 1.406226
2009 1.550677 2.514929 3.638194 5.886567 7.136065 7.076900 4.911968 4.482323 4.136133 3.140161 1.711367 1.310419
2010 1.145032 2.268714 3.810806 5.574067 6.637968 6.386200 4.053516 4.121645 3.644833 2.611677 1.922900 1.228548
2011 1.255613 1.973179 3.537774 5.150667 6.590258 5.667733 4.359000 3.941581 3.518067 3.164194 1.816267 1.386806
2012 1.283097 2.264071 3.590774 4.991200 6.632419 6.991933 5.337000 3.790742 3.975933 3.073032 1.884733 1.460000
2013 1.348903 2.020286 3.506968 5.135733 6.758484 6.083067 5.062032 3.960806 4.006767 2.482355 1.847067 1.308935
2014 1.315484 1.843143 3.335871 5.278733 6.376839 6.622633 5.231161 4.784194 3.803367 2.784419 2.092100 1.239129
2015 1.188645 2.071250 3.273710 5.245267 6.563032 6.364300 4.762935 4.259968 4.331767 3.093968 1.763933 1.366097
2016 1.126452 2.321750 3.454452 5.230500 6.608355 6.304333 4.480065 4.060677 3.700300 2.635129 1.917700 1.302387
2017 1.348258 2.293643 3.584710 5.840500 6.461097 5.991267 5.174258 4.423935 4.118567 2.903323 1.513767 1.316742
2018 1.476806 2.383071 3.793903 5.501733 6.465613 6.015867 4.535613 4.183742 3.864633 3.015194 1.912333 1.260548
2019 1.379097 1.921607 3.159355 5.150267 6.111903 7.160633 4.312903 4.342065 3.734133 2.758161 1.855500 1.035161
2020 1.318032 2.301500 3.000516 4.894567 6.206806 6.375333 5.183903 4.372548 4.349467 3.108161 1.807200 1.370452
2021 1.252290 2.244071 3.749000 5.163167 5.991548 5.799233 4.835871 4.747968 3.488267 3.306258 1.927867 1.208581
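For reference, the same years-by-months table of means can also be built with pandas groupby, avoiding the manual 3-D slicing; a sketch on synthetic daily data using the same Julian-day cutoffs m:

```python
import numpy as np
import pandas as pd

m = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365]
month_of_day = np.repeat(np.arange(1, 13), np.diff(m))  # month number (1-12) for each Julian day

n_years = 2  # two synthetic years for brevity
daily = pd.DataFrame({
    "year": np.repeat(np.arange(1995, 1995 + n_years), 365),
    "month": np.tile(month_of_day, n_years),
    "ETo": np.random.default_rng(0).uniform(1, 8, n_years * 365),
})

# Years on the index, months on the columns -- the same layout as df_f above.
monthly_mean = daily.groupby(["year", "month"])["ETo"].mean().unstack()
```

Applied to the real dataset, this reproduces df_f without any intermediate 3-D array.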

5.1 Trend analysis (monthly average)¶

month_mk_test() is a function that runs the Mann-Kendall test for each month and returns a dataframe showing the trend and related statistics for each month over the span of 27 years.

In [44]:
def month_mk_test(feature =None):
    monthly_avg = month_data_avg(shape_monthly,m)
    mon_mk = np.zeros((12,9), dtype='<U12')
    for i in range(12):
        mon_mk[i,:] = list(mk.original_test(monthly_avg[:,feature,i]))
    df_mon_mk = pd.DataFrame(mon_mk, index =months, 
                         columns = ["Trend", "Hypothesis", "p-value", "z-value", "Tau",
                                    "s", "Var_s", "Slope",
                                   "Intercept"])
    return df_mon_mk
In [45]:
df_mon_mk_et = month_mk_test(10) #monthly ETo trend
In [46]:
df_mon_mk_et
Out[46]:
Trend Hypothesis p-value z-value Tau s Var_s Slope Intercept
Jan no trend False 0.0729994301 -1.792834256 -0.247863247 -87.0 2301.0 -0.008207885 1.4857992831
Feb no trend False 0.3812637215 -0.875570218 -0.122507122 -43.0 2301.0 -0.003699248 2.3121616541
March no trend False 0.2602784927 -1.125733137 -0.156695156 -55.0 2301.0 -0.009354838 3.7063225806
April no trend False 0.1131106461 -1.584365157 -0.219373219 -77.0 2301.0 -0.014511904 5.6441880952
May decreasing True 0.0001251402 -3.835831432 -0.527065527 -185.0 2301.0 -0.028338709 7.0008225806
June no trend False 0.7703961609 -0.291856739 -0.042735042 -15.0 2301.0 -0.004576190 6.3638238095
July no trend False 0.6464984392 -0.458632019 -0.065527065 -23.0 2301.0 -0.006341397 4.9019543010
Aug no trend False 0.5047075455 -0.667101118 -0.094017094 -33.0 2301.0 -0.007824596 4.5077842741
Sept decreasing True 0.0045799907 -2.835179754 -0.390313390 -137.0 2301.0 -0.024231884 4.4335811594
Oct no trend False 0.0665756440 -1.834528076 -0.253561253 -89.0 2301.0 -0.008916129 3.2202645161
Nov decreasing True 0.0011454847 -3.252117953 -0.447293447 -157.0 2301.0 -0.014524074 2.1117129629
Dec decreasing True 0.0097374242 -2.585016835 -0.356125356 -125.0 2301.0 -0.007996204 1.4744022770
In [47]:
fig, axes = plt.subplots(nrows=1,ncols=2, figsize = (15,8))
fig.suptitle('Average Monthly ET from 1995 to 2021')

axes[0].set_title('Avg. monthly ET of Jan and Feb from 1995 to 2021')
axes[0].set_xlabel('Year')
axes[0].set_ylabel('Avg. monthly ET')
axes[0].plot(df_f.index,df_f['Jan'],'b.-',ms=10, label = 'Jan')
axes[0].plot(df_f.index,df_f['Feb'],'y.-',ms=10, label = 'Feb')
axes[0].legend(loc=1)

axes[1].set_title('Avg. monthly ET of Nov and Dec from 1995 to 2021')
axes[1].set_xlabel('Year')
axes[1].set_ylabel('Avg. monthly ET')
axes[1].plot(df_f.index,df_f['Nov'],'g.-',ms=10, label = 'Nov')
axes[1].plot(df_f.index,df_f['Dec'],'r.-',ms=10, label = 'Dec')
axes[1].legend(loc=1);
#axes[1].axline((0, 2.1117129629), slope=-0.014524074, color='C0', label='by slope');
In [48]:
df_mon_mk_tmax = month_mk_test(1) #monthly Tmax trend
In [49]:
df_mon_mk_tmax
Out[49]:
Trend Hypothesis p-value z-value Tau s Var_s Slope Intercept
Jan no trend False 0.7074787276 -0.375244379 -0.054131054 -19.0 2301.0 -0.014940577 17.594227504
Feb no trend False 0.2972515252 1.0423454980 0.1452991452 51.0 2301.0 0.0432330827 20.923684210
March no trend False 0.4784517390 0.7087949386 0.0997150997 35.0 2301.0 0.0276497695 26.808294930
April no trend False 0.6168456628 -0.500325839 -0.071225071 -25.0 2301.0 -0.031333333 35.250666666
May no trend False 0.4784517390 -0.708794938 -0.099715099 -35.0 2301.0 -0.020737327 39.353456221
June no trend False 0.2783473662 1.0840393179 0.1509971509 53.0 2301.0 0.0474999999 36.929166666
July no trend False 0.2972515252 1.0423454980 0.1452991452 51.0 2301.0 0.0196480938 33.751026392
Aug no trend False 0.3375799754 0.9589578582 0.1339031339 47.0 2301.0 0.0176881720 33.328118279
Sept no trend False 0.2875899350 -1.063423511 -0.148148148 -52.0 2300.0 -0.026904761 33.373095238
Oct no trend False 0.4784517390 -0.708794938 -0.099715099 -35.0 2301.0 -0.024731182 32.173118279
Nov no trend False 0.1961811327 -1.292508417 -0.179487179 -63.0 2301.0 -0.032156862 27.331372549
Dec no trend False 0.6464984392 -0.458632019 -0.065527065 -23.0 2301.0 -0.010215053 20.716666666
In [50]:
df_Tmax = pd.DataFrame(monthly[:,1,:], index = list(range(1995,2022)), columns = months)
In [51]:
fig, axes = plt.subplots(nrows=1,ncols=2, figsize = (15,8))
fig.suptitle('Average Monthly Tmax from 1995 to 2021')

axes[0].set_title('Avg. monthly Tmax of Jan and Feb from 1995 to 2021')
axes[0].set_xlabel('Year')
axes[0].set_ylabel('Avg. monthly Tmax')
axes[0].plot(df_Tmax.index,df_Tmax['Jan'],'b.-',ms=10, label = 'Jan')
axes[0].plot(df_Tmax.index,df_Tmax['Feb'],'y.-',ms=10, label = 'Feb')
axes[0].legend(loc=1)

axes[1].set_title('Avg. monthly Tmax of Nov and Dec from 1995 to 2021')
axes[1].set_xlabel('Year')
axes[1].set_ylabel('Avg. monthly Tmax')
axes[1].plot(df_Tmax.index,df_Tmax['Nov'],'g.-',ms=10, label = 'Nov')
axes[1].plot(df_Tmax.index,df_Tmax['Dec'],'r.-',ms=10, label = 'Dec')
axes[1].legend(loc=1);
#axes[1].axline((0, 2.1117129629), slope=-0.014524074, color='C0', label='by slope');
In [52]:
plt.figure(figsize=(6,4), dpi = 100)
plt.xlabel('Reference Evapotranspiration (ETo)')
for i in months:
    sns.kdeplot(df_f[i], label=i)
    plt.legend(loc=1);
In [53]:
plt.figure(figsize=(6,4), dpi = 100)
plt.xlabel('Maximum Temperature (Tmax)')
for i in months:
    sns.kdeplot(df_Tmax[i], label=i)
    plt.legend(loc=1);

5.2 Trend analysis (annual average)¶

In [54]:
df_jTmax = pd.DataFrame(annual[:,1,:], columns=range(1995,2022))
df_jTmin = pd.DataFrame(annual[:,2,:], columns=range(1995,2022))
df_jRns = pd.DataFrame(annual[:,8,:], columns=range(1995,2022))
df_jVPD = pd.DataFrame(annual[:,9,:], columns=range(1995,2022))
df_jETo = pd.DataFrame(annual[:,10,:], columns=range(1995,2022))
df_jRHmax = pd.DataFrame(annual[:,3,:], columns=range(1995,2022))
df_jRHmin = pd.DataFrame(annual[:,4,:], columns=range(1995,2022))
df_jWS = pd.DataFrame(annual[:,5,:], columns=range(1995,2022))
In [55]:
ls=[1995,2000,2005,2010,2015,2020]
plt.figure(figsize=(12,6), dpi = 100)
plt.xlabel('Maximum Temperature (Tmax)')
for i in ls:
    sns.kdeplot(df_jTmax[i], label=i)
    plt.legend(loc=1);
In [56]:
ls=[1995,2000,2005,2010,2015,2020]
plt.figure(figsize=(12,6), dpi = 100)
plt.xlabel('Reference Evapotranspiration (ETo)')
for i in ls:
    sns.kdeplot(df_jETo[i], label=i)
    plt.legend(loc=1);
In [57]:
ix = [1,2,3,4,5,8,9,10]
cl = ['Tmax','Tmin','RHmax','RHmin','Wind Speed','Rns','VPD','ETo']
an_mk = np.zeros((len(ix),9), dtype='<U12')
for i in range(len(ix)):
    an_mk[i,:] = list(mk.original_test(annual[:,ix[i],:].mean(axis=0)))
df_an_mk = pd.DataFrame(an_mk, index =cl, 
                         columns = ["Trend", "Hypothesis", "p-value", "z-value", "Tau",
                                    "s", "Var_s", "Slope",
                                   "Intercept"])

For some selected parameters, the Mann-Kendall trend on annual average values from 1995 to 2021 is calculated. It is found that Tmax, RHmin, and VPD show no trend; Tmin shows an increasing trend; and RHmax, Rns, wind speed, and ETo show decreasing trends at the 5% significance level. (The Rns p-value appears in the table as 3.6918904438 only because the 12-character string array truncates its scientific notation; the full value is 3.6918904438e-06.)

In [58]:
df_an_mk
Out[58]:
Trend Hypothesis p-value z-value Tau s Var_s Slope Intercept
Tmax no trend False 0.8024613596 -0.250162919 -0.037037037 -13.0 2301.0 -0.004257990 29.864121004
Tmin increasing True 0.0009883866 3.2938117738 0.4529914529 159.0 2301.0 0.0421575342 16.920719178
RHmax decreasing True 0.0097374242 -2.585016835 -0.356125356 -125.0 2301.0 -0.136073059 85.157990867
RHmin no trend False 0.6464984392 -0.458632019 -0.065527065 -23.0 2301.0 -0.025022831 49.023926940
Wind Speed decreasing True 0.0370973584 -2.084690996 -0.287749287 -101.0 2301.0 -0.004328584 1.1264058447
Rns decreasing True 3.6918904438 -4.628014011 -0.635327635 -223.0 2301.0 -0.066842571 14.380558904
VPD no trend False 0.2430379897 1.1674269578 0.1623931623 57.0 2301.0 0.0024511896 1.4102687815
ETo decreasing True 0.0007322794 -3.377199413 -0.464387464 -163.0 2301.0 -0.010368150 3.8874626712
In [59]:
df_j = [df_jTmax,df_jTmin,df_jRHmax,df_jRHmin,df_jWS,df_jRns,df_jVPD,df_jETo]
color = ['coral','blue','salmon','lawngreen','purple','olive','tomato','darkcyan']
fig, ax = plt.subplots(4,2, figsize=(16,12), dpi=150)
for i in range(len(df_j)):
    u =i//2
    v = i%2
    df_j[i].mean().plot(ax=ax[u][v],color=color[i], lw=2)
    ax[u][v].set_title(f'Avg. annual {cl[i]} from 1995 to 2021')
    ax[u][v].set_xlabel('Year')
    ax[u][v].set_ylabel(f'Avg. annual {cl[i]}')
plt.tight_layout();
In [60]:
#Tmax cluster map
plt.figure(dpi=150, figsize=(4,4))
sns.clustermap(df_jTmax,row_cluster=False)
Out[60]:
<seaborn.matrix.ClusterGrid at 0x1dcd853dca0>
<Figure size 600x600 with 0 Axes>

These cluster maps show how similar different years are in terms of a given parameter. The x-axis shows the years, with a dendrogram at the top: the shorter the dendrogram link, the more similar the parameter values of the connected years. The y-axis depicts the Julian days of the year.

In [61]:
#ETo cluster map
plt.figure(dpi=150, figsize=(4,4))
sns.clustermap(df_jETo,row_cluster=False)
Out[61]:
<seaborn.matrix.ClusterGrid at 0x1dcd90ebcd0>
<Figure size 600x600 with 0 Axes>

